AITopics | spurious feature

spurious feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Role of Causal Features in Strategic Classification for Robustness and Alignment

Gois, Antonio, Gunluk, Sophia, Rosenfeld, Nir, Hegde, Nidhi, Lacoste-Julien, Simon, Sridhar, Dhanya

arXiv.org Machine Learning, May-27-2026

In strategic classification, an institution (e.g., a bank) anticipates adaptation from users who change their features to increase utility in a classification task (e.g., loan repayment). Since a key challenge is the distribution shift induced by users, we turn to causal models, which have been shown to bound the worst-case out-of-distribution (OOD) risk, and establish several new results that link causality and strategic classification. First, we show that causal classification leads to optimal classification error after any sufficiently large adaptation, when the noise is bounded in a certain way. Second, when these as-

As we develop better algorithms under varying assumptions about adaptation (Levanon and Rosenfeld, 2022; Kleinberg and Raghavan, 2018), there are growing concerns about negative social impact on the agents who adapt to these systems, whether outcomes are static (Milli et al., 2019) or dynamic (Gois et al., 2025). When agents adapt, depending on the underlying causal model (Horowitz and Rosenfeld, 2018; Miller et al., 2020), some changes improve agent outcomes while others constitute gaming the classifier, worsening classification error. In this paper, we study whether classifiers can maintain accuracy without sacrificing alignment with predicted agent's goals.

artificial intelligence, classifier, machine learning, (17 more...)

arXiv.org Machine Learning

2605.27163

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Loans (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Causality as the Statistical Conscience of Artificial Intelligence: From Pearl's Ladder to Trustworthy Machines

Fokoué, Ernest

arXiv.org Machine Learning, May-26-2026

Modern Artificial Intelligence achieves remarkable predictive power by optimizing statistical risk functionals over vast corpora. Yet a gap separates this from genuine intelligence: the inability to distinguish correlation from causation. This paper argues that causal inference (identifying mechanisms invariant under intervention) is AI's indispensable statistical conscience. Without causal grounding, AI systems are correlation machines: powerful in familiar domains, brittle under distribution shift, and biased in high-stakes settings. Three contributions develop this argument. First, a Statistical Necessity Theorem for Causal Generalization: any algorithm achieving out-of-distribution generalization must encode causal structure, formalizing the distinction between prediction P(Y|X) and intelligence P(Y|do(X)). Second, a unified framework connects Pearl's do-calculus, the Potential Outcomes framework, Double Machine Learning, and Invariant Risk Minimization as a family of Causal Statistical Estimators, each identifying interventional distributions under different assumptions. Third, three AI failure modes (hallucination in large language models, reward hacking in reinforcement learning from human feedback, and degradation under distribution shift) are manifestations of causal blindness, each admitting a principled statistical remedy. Trustworthy AI is, at its core, a problem of causal statistics. The statistical community is not merely equipped to solve it -- it is the only community with the foundational tools to do so rigorously.
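The distinction the abstract draws between prediction P(Y|X) and P(Y|do(X)) can be illustrated with a small exact calculation. The sketch below is not taken from the paper: it assumes a hypothetical three-variable model in which a confounder Z influences both X and Y, and every probability in it is an illustrative value chosen for this example. It compares the observational conditional with the interventional quantity obtained by backdoor adjustment over Z.

```python
# Hypothetical toy structural causal model (all numbers are illustrative):
#   Z -> X, Z -> Y, X -> Y, with binary variables.
P_Z = {0: 0.5, 1: 0.5}            # P(Z = z)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(X = 1 | Z = z)


def p_y1_given_xz(x: int, z: int) -> float:
    """P(Y = 1 | X = x, Z = z) for the toy model."""
    return 0.2 + 0.1 * x + 0.6 * z


def p_x_given_z(x: int, z: int) -> float:
    """P(X = x | Z = z) for the toy model."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def conditional(x: int) -> float:
    """Observational P(Y = 1 | X = x): Z ends up weighted by P(Z | X = x)."""
    joint = sum(P_Z[z] * p_x_given_z(x, z) * p_y1_given_xz(x, z) for z in P_Z)
    marginal_x = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return joint / marginal_x


def interventional(x: int) -> float:
    """P(Y = 1 | do(X = x)) via backdoor adjustment: Z is weighted by P(Z)."""
    return sum(P_Z[z] * p_y1_given_xz(x, z) for z in P_Z)


if __name__ == "__main__":
    print("observational gap:", conditional(1) - conditional(0))
    print("interventional gap:", interventional(1) - interventional(0))
```

In this toy model the observational gap between X = 1 and X = 0 is much larger than the interventional gap, because most of the association flows through Z. A predictor fitted to P(Y|X) would absorb that confounded association, which is the kind of brittleness under shift the abstract attributes to correlation-only systems.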

large language model, machine learning, reinforcement learning, (21 more...)

arXiv.org Machine Learning

2605.24076

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.34)

Understanding and Improving Feature Learning for Out-of-Distribution Generalization

Neural Information Processing Systems, Apr-29-2026, 22:43:17 GMT

A common explanation for the failure of out-of-distribution (OOD) generalization is that the model trained with empirical risk minimization (ERM) learns spurious features instead of invariant features. However, several recent studies challenged this explanation and found that deep networks may have already learned sufficiently good features for OOD generalization. Despite the apparent contradiction, we theoretically show that ERM essentially learns both spurious and invariant features, while ERM tends to learn spurious features faster if the spurious correlation is stronger. Moreover, when the ERM-learned features are fed to the OOD objectives, the quality of invariant feature learning significantly affects the final OOD performance, as OOD objectives rarely learn new features. Therefore, ERM feature learning can be a bottleneck to OOD generalization. To alleviate the reliance, we propose Feature Augmented Training (FeAT), which pushes the model to learn richer features ready for OOD generalization. FeAT iteratively augments the model to learn new features while retaining the already learned features. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FeAT effectively learns richer features, thus boosting the performance of various OOD objectives.

artificial intelligence, generalization, machine learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

When Does Group Invariant Learning Survive Spurious Correlations?

Chen, Yimeng, Xiong, Ruibin, Ma, Zhiming, Lan, Yanyan

Neural Information Processing Systems, Apr-25-2026, 07:42:10 GMT

By inferring latent groups in the training data, recent works introduce invariant learning to the case where environment annotations are unavailable. Typically, learning group invariance under a majority/minority split is empirically shown to be effective in improving out-of-distribution generalization on many datasets. However, theoretical guarantees that these methods learn invariant mechanisms are lacking. In this paper, we reveal the insufficiency of existing group invariant learning methods in preventing classifiers from depending on spurious correlations in the training set. Specifically, we propose two criteria for judging such sufficiency. Theoretically and empirically, we show that existing methods can violate both criteria and thus fail to generalize to spurious correlation shifts. Motivated by this, we design a new group invariant learning method, which constructs groups with statistical independence tests, and reweights samples by group label proportion to meet the criteria. Experiments on both synthetic and real data demonstrate that the new method significantly outperforms existing group invariant learning methods in generalizing to spurious correlation shifts.
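The abstract mentions reweighting samples by group label proportion but does not spell out the scheme. The sketch below shows one common balancing rule as an assumption, not the paper's method: each observed (group, label) combination receives the same total weight, split evenly among its samples. The function name `group_label_weights` and the uniform-per-cell target are hypothetical choices made for this illustration, and the group construction via independence tests is not covered.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def group_label_weights(
    groups: Sequence[Hashable], labels: Sequence[Hashable]
) -> List[float]:
    """Weight each sample inversely to the size of its (group, label) cell.

    Assumption for this sketch: every observed cell should carry the same
    total weight, so rare group/label combinations are up-weighted.
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    cells = list(zip(groups, labels))
    counts = Counter(cells)
    n_cells = len(counts)
    # Each cell gets total mass 1 / n_cells, shared equally by its samples.
    return [1.0 / (n_cells * counts[cell]) for cell in cells]


if __name__ == "__main__":
    demo_groups = [0, 0, 0, 1]
    demo_labels = [1, 1, 0, 0]
    print(group_label_weights(demo_groups, demo_labels))
```

The returned weights sum to one and could be passed to any loss that accepts per-sample weights. Other targets (for example, matching label proportions across groups) would change only the final line.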

artificial intelligence, machine learning, spurious correlation, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

1c336b8080f82bcc2cd2499b4c57261d-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 23:34:08 GMT

artificial intelligence, inv, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
North America > United States (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

Neural Information Processing Systems, Apr-24-2026, 23:34:04 GMT

The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.
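To make the idea of combining an invariance term with a bottleneck term concrete, here is a minimal sketch of such an objective. It rests on several assumptions that are not stated in the abstract: squared-error loss is used for simplicity (the abstract's main results concern classification), the invariance term follows the usual IRMv1-style gradient penalty, and the bottleneck is approximated by the variance of the predictor's output, with the scalar prediction standing in for the representation. The names `invariance_plus_bottleneck`, `lam`, and `gamma` are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Callable, Sequence, Tuple

# One environment is a pair (inputs, targets).
Env = Tuple[Sequence[float], Sequence[float]]


def invariance_plus_bottleneck(
    predict: Callable[[float], float],
    envs: Sequence[Env],
    lam: float = 1.0,
    gamma: float = 0.1,
) -> float:
    """Average over environments of risk + lam * invariance + gamma * bottleneck."""
    total = 0.0
    for xs, ys in envs:
        preds = [predict(x) for x in xs]
        # Empirical squared-error risk in this environment.
        risk = fmean((p - y) ** 2 for p, y in zip(preds, ys))
        # IRMv1-style penalty: derivative of the risk with respect to a dummy
        # scale w on the predictions, evaluated at w = 1, then squared.
        grad = fmean(2.0 * (p - y) * p for p, y in zip(preds, ys))
        # Bottleneck stand-in (assumption): variance of the predictor output.
        total += risk + lam * grad ** 2 + gamma * pvariance(preds)
    return total / len(envs)


if __name__ == "__main__":
    demo_envs = [([1.0, 2.0], [1.0, 2.0]), ([0.5, 1.5], [0.5, 1.5])]
    print(invariance_plus_bottleneck(lambda x: x, demo_envs))
```

With a perfect predictor the risk and invariance terms vanish and only the variance term remains, which shows how the bottleneck term acts as a preference for lower-variance solutions among those that already fit every environment.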

artificial intelligence, invariant feature, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

0b5eb45a22ff33956c043dd271f244ea-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 12:33:02 GMT

artificial intelligence, machine learning, training environment, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Robust Learning with Progressive Data Expansion Against Spurious Correlation

Neural Information Processing Systems, Apr-24-2026, 07:16:26 GMT

While deep learning models have shown remarkable performance in various tasks, they are susceptible to learning non-generalizable spurious features rather than the core features that are genuinely correlated to the true label. In this paper, beyond existing analyses of linear models, we theoretically examine the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. In light of this, we propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance. PDE begins with a group-balanced subset of training data and progressively expands it to facilitate the learning of the core features. Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers. On average, our method achieves a 2.8% improvement in worst-group accuracy compared with the state-of-the-art method, while enjoying up to 10× faster training efficiency. Codes are available at https://github.com/uclaml/PDE.
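The data schedule described in the abstract (start from a group-balanced subset, then progressively expand it) can be sketched as a generator of index sets. This is an illustration under assumptions and not the code from the linked repository: the function name `progressive_expansion`, the uniform random choice of new samples, and the fixed `add_per_step` increment are all hypothetical, and the training loop that would consume each stage is omitted.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence


def progressive_expansion(
    groups: Sequence[Hashable],
    add_per_step: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield growing lists of training indices.

    First yield: a group-balanced subset in which every group contributes as
    many samples as the smallest group. Later yields: the previous subset plus
    up to `add_per_step` randomly chosen indices that were not yet used.
    """
    if add_per_step < 1:
        raise ValueError("add_per_step must be positive")
    rng = random.Random(seed)
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    if not by_group:
        return
    smallest = min(len(ids) for ids in by_group.values())

    # Warm-up stage: balanced sample across groups.
    selected: List[int] = []
    for ids in by_group.values():
        selected.extend(rng.sample(ids, smallest))
    yield list(selected)

    # Expansion stages: add the remaining data a few samples at a time.
    chosen = set(selected)
    remaining = [i for i in range(len(groups)) if i not in chosen]
    rng.shuffle(remaining)
    while remaining:
        batch, remaining = remaining[:add_per_step], remaining[add_per_step:]
        selected.extend(batch)
        yield list(selected)


if __name__ == "__main__":
    demo_groups = [0] * 8 + [1] * 2
    for stage, indices in enumerate(progressive_expansion(demo_groups, 3)):
        print(stage, sorted(indices))
```

A training loop would run some number of optimization steps on each yielded subset before moving to the next one; how long to stay on each stage is a separate design choice that this sketch leaves open.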

artificial intelligence, machine learning, spurious feature, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
